Conversational recommender systems (CRS) aim to employ natural language conversations to suggest suitable products to users. Understanding user preferences for prospective items and learning efficient item representations are crucial for CRS. Despite various attempts, earlier studies mostly learn item representations from individual conversations, ignoring the item popularity embodied across all conversations. Moreover, they struggle to capture user preferences efficiently, since the information reflected in a single conversation is limited. To address these issues, inspired by collaborative filtering, we propose a collaborative augmentation (COLA) method that improves both item representation learning and user preference modeling. We construct an interactive user-item graph from all conversations, which augments item representations with user-aware information, i.e., item popularity. To improve user preference modeling, we retrieve similar conversations from the training corpus, where the involved items and attributes that reflect the user's potential interests are used to augment the user representation through gate control. Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method. Our code and data are available at https://github.com/DongdingLin/COLA.
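To make the gate-control step concrete, below is a minimal sketch of how retrieved-conversation signals could be fused into the user representation through a learned gate. The module name, tensor shapes, and the exact fusion form are illustrative assumptions, not COLA's released implementation.

```python
# A minimal sketch of gate-controlled fusion; names and shapes are assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuses the user representation with a representation aggregated
    from retrieved similar conversations via a learned gate."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, user_repr: torch.Tensor, retrieved_repr: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides how much retrieved preference signal to admit.
        g = torch.sigmoid(self.gate(torch.cat([user_repr, retrieved_repr], dim=-1)))
        return g * user_repr + (1.0 - g) * retrieved_repr

fusion = GatedFusion(hidden_size=128)
u = torch.randn(4, 128)   # user representations from the current conversation
r = torch.randn(4, 128)   # aggregated items/attributes from retrieved conversations
print(fusion(u, r).shape)  # torch.Size([4, 128])
```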
Error correction in automatic speech recognition (ASR) aims to correct the incorrect words in sentences generated by ASR models. Since recent ASR models usually have a low word error rate (WER), an error correction model should modify only the incorrect words to avoid affecting originally correct tokens, which makes detecting incorrect words important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide a clear signal about which tokens are incorrect, while explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism that avoids the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a dedicated language model, and then design a constrained CTC loss that duplicates only the detected incorrect tokens, letting the decoder focus on correcting the error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides an explicit signal about which words are incorrect and thus does not need to duplicate every token but only the incorrect ones; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but simply leaves them to the CTC loss. Experiments on the AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reductions respectively, outperforming previous works by a large margin while still enjoying the fast speed of parallel generation.
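As a toy illustration of the soft error-detection step, the sketch below flags tokens whose language-model probability falls below a threshold. The threshold value and tensor shapes are assumptions; SoftCorrect's actual detector and constrained CTC decoder are more involved.

```python
# Toy soft error detection: flag tokens with low LM probability.
# The threshold is an illustrative assumption, not the paper's value.
import torch

def detect_errors(token_logprobs: torch.Tensor, threshold: float = -2.0) -> torch.Tensor:
    """token_logprobs: (batch, seq_len) log-probability the LM assigns to
    each observed token. Returns a boolean mask, True = likely incorrect."""
    return token_logprobs < threshold

logprobs = torch.tensor([[-0.1, -3.5, -0.4, -2.8]])
print(detect_errors(logprobs))  # tensor([[False,  True, False,  True]])
```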
Query-focused summarization has been considered an important extension of text summarization. It aims to generate a concise highlight for a given query. Unlike generic text summarization, query-focused summarization has long been plagued by a lack of high-quality large-scale datasets. In this paper, we investigate whether we can integrate and transfer the knowledge of text summarization and question answering to assist few-shot learning in query-focused summarization. We propose prefix-merging, a prefix-based pretraining strategy for few-shot learning in query-focused summarization. Drawing inspiration from prefix-tuning, prefix-merging lets us integrate the task knowledge from text summarization and question answering into a properly designed prefix and apply the merged prefix to query-focused summarization. With only a small number of trainable parameters, prefix-merging outperforms fine-tuning on query-focused summarization. We further discuss the influence of different prefix designs and propose a visualized explanation of how prefix-merging works.
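A rough sketch of what merging two task prefixes might look like, assuming each task contributes a trainable prefix of shape (prefix_len, hidden). The merge-by-concatenation shown here is one plausible design, not necessarily the paper's exact scheme.

```python
# Sketch of merging two task prefixes; shapes and the concatenation merge
# are illustrative assumptions.
import torch
import torch.nn as nn

class MergedPrefix(nn.Module):
    def __init__(self, prefix_len: int, hidden: int):
        super().__init__()
        self.summ_prefix = nn.Parameter(torch.randn(prefix_len, hidden))  # from summarization
        self.qa_prefix = nn.Parameter(torch.randn(prefix_len, hidden))    # from QA

    def forward(self, batch_size: int) -> torch.Tensor:
        merged = torch.cat([self.summ_prefix, self.qa_prefix], dim=0)
        # Broadcast the merged prefix across the batch; a frozen LM would
        # attend to these vectors as prepended key/value states.
        return merged.unsqueeze(0).expand(batch_size, -1, -1)

prefix = MergedPrefix(prefix_len=10, hidden=768)
print(prefix(batch_size=2).shape)  # torch.Size([2, 20, 768])
```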
The task of empathetic response generation aims to understand how a speaker feels about their own experiences and then reply appropriately. To tackle the task, it is essential to model the content-emotion duality of a dialogue, which consists of a content view (i.e., what personal experiences are described) and an emotion view (i.e., how the speaker feels about these experiences). To this end, we design a framework that models the Content-Emotion Duality via disentanglement (CEDual) for empathetic response generation. With disentanglement, we encode the dialogue history from both the content and emotion views, and then generate the empathetic response conditioned on the disentangled representations, so that both the content and emotion information of the dialogue history are embedded in the generated response. Experiments on the benchmark dataset EmpatheticDialogues show that the CEDual model achieves state-of-the-art performance on both automatic and human metrics, and it also generates more empathetic responses than previous methods.
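An illustrative sketch of the disentanglement idea: two projection heads split a dialogue-history encoding into content and emotion views. The layer sizes and the way a decoder would consume the two views are assumptions, not CEDual's actual architecture.

```python
# Illustrative disentanglement into content and emotion views;
# layer sizes are assumptions.
import torch
import torch.nn as nn

class ContentEmotionDisentangler(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.content_head = nn.Linear(hidden, hidden)
        self.emotion_head = nn.Linear(hidden, hidden)

    def forward(self, history_repr: torch.Tensor):
        content = self.content_head(history_repr)  # what experiences are described
        emotion = self.emotion_head(history_repr)  # how the speaker feels about them
        return content, emotion

h = torch.randn(2, 512)
content, emotion = ContentEmotionDisentangler(512)(h)
# A decoder would then condition on both views to generate the response.
print(content.shape, emotion.shape)  # torch.Size([2, 512]) torch.Size([2, 512])
```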
Deriving the governing equations of complex physical systems from first principles can be quite challenging when there are unknown terms and hidden physical mechanisms in the system. In this work, we employ a deep learning architecture to learn the fluid partial differential equations (PDEs) of a plasma system from data acquired from a fully kinetic model. We demonstrate that the learned multi-moment fluid PDEs can incorporate kinetic effects such as Landau damping. Based on the learned fluid closure, the data-driven multi-moment fluid modeling can well reproduce all the physical quantities derived from the fully kinetic model. The calculated damping rate of Landau damping is consistent with both the fully kinetic simulation and the linear theory. Data-driven fluid modeling of PDEs for complex physical systems can be applied to improve fluid closures and reduce the computational cost of multi-scale modeling of global systems.
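The general recipe of learning a fluid closure from kinetic data can be sketched as a regression problem: a small network maps the lower-order fluid moments to the unclosed term. The moment set, closure target, and data shapes below are illustrative assumptions, not the paper's code.

```python
# Sketch of closure learning: an MLP maps fluid moments (density, velocity,
# pressure) to the unclosed heat-flux term diagnosed from a kinetic run.
# All shapes and targets are illustrative assumptions.
import torch
import torch.nn as nn

closure_net = nn.Sequential(
    nn.Linear(3, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 1),  # predicted heat flux q
)

moments = torch.randn(1024, 3)    # (n, u, p) sampled from a kinetic simulation
heat_flux = torch.randn(1024, 1)  # target q diagnosed from the same run
optimizer = torch.optim.Adam(closure_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(closure_net(moments), heat_flux)
    loss.backward()
    optimizer.step()
```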
Recommender systems usually learn user interests from various kinds of user behaviors, including clicks and post-click behaviors (e.g., liking and favoriting). However, these behaviors inevitably exhibit popularity bias, leading to unfairness issues: 1) for items with similar quality, the more popular ones get more exposure; and 2) even worse, popular items with lower quality may receive more exposure. Existing work on mitigating popularity bias eliminates the bias blindly and usually ignores the effect of item quality. We argue that the relations between different user behaviors (e.g., the conversion rate) actually reflect item quality. Therefore, to handle the unfairness issues, we propose to mitigate popularity bias by considering multiple user behaviors. In this work, we examine the causal relations behind the interaction generation procedure in multi-behavior recommendation. Specifically, we find that: 1) item popularity is a confounder between the exposed items and users' post-click interactions, which leads to the first unfairness; and 2) some hidden confounders (e.g., the reputation of item producers) affect both item popularity and quality, resulting in the second unfairness. To alleviate these confounding issues, we propose a causal framework to estimate the causal effect, which leverages backdoor adjustment to block the backdoor paths caused by the confounders. At the inference stage, we remove the negative effect of popularity and leverage the good effect of quality for recommendation. Experiments on two real-world datasets validate the effectiveness of our proposed framework, which enhances fairness without sacrificing recommendation accuracy.
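The backdoor adjustment mentioned above has a simple arithmetic core: instead of conditioning on exposure alone, average the stratum-conditional outcome rates weighted by the confounder's marginal distribution. The toy numbers below are made up purely for illustration.

```python
# Toy backdoor adjustment with popularity Z as the confounder.
# P(click | do(expose item)) = sum_z P(click | item, z) * P(z)
p_z = {"head": 0.2, "tail": 0.8}                      # P(Z): popularity strata
p_click_given_item_z = {"head": 0.30, "tail": 0.10}   # P(click | item, Z=z)

p_do = sum(p_click_given_item_z[z] * p_z[z] for z in p_z)
print(f"P(click | do(item)) = {p_do:.3f}")  # 0.140
```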
With the ever-growing number of videos, there is a strong demand for techniques that help people quickly navigate to the video segments they are interested in. However, current work on video understanding mainly focuses on video content summarization, with little effort devoted to exploring the structure of a video. Inspired by text outline generation, we introduce a novel video understanding task, namely Video Outline Generation (VOG). The task is defined to contain two sub-tasks: (1) first segmenting the video according to its content structure, and then (2) generating a heading for each segment. To learn and evaluate VOG, we annotate a 10k+ dataset called DuVOG. Specifically, we use OCR tools to recognize the subtitles of videos, and then ask annotators to divide the subtitles into chapters and title each chapter. In videos, highlighted text tends to be a heading, since it is more likely to attract attention. Therefore we propose a Visual Subtitle feature-Enhanced video outline generation model (VSENet), which takes the textual subtitles together with their visual font sizes and positions as input. We treat the VOG task as a sequence labeling problem that extracts the spans where headings are located and then rewrites them to form the final outline. Furthermore, based on the similarity between video outlines and text outlines, we use a large number of articles with chapter headings to pretrain our model. Experiments on DuVOG show that our model largely outperforms other baseline methods, achieving an F1 score of 77.1 at the video segmentation level and a ROUGE-L_F0.5 of 85.0 at the heading generation level.
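The sequence-labeling view of heading extraction can be illustrated with standard BIO decoding: tag each subtitle token and read off the heading spans, which would then be rewritten into the final outline. The tag set and decoding rule below follow common BIO conventions and are assumed rather than taken from VSENet's code.

```python
# Standard BIO decoding over subtitle tokens; conventions assumed,
# not taken from VSENet's released code.
def extract_headings(tokens, tags):
    """tokens: list[str]; tags: list[str] in {"B", "I", "O"}."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

print(extract_headings(
    ["Intro", "to", "the", "recipe", "step", "one"],
    ["B", "I", "I", "I", "O", "O"],
))  # ['Intro to the recipe']
```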
Recommendation dialog systems aim to build social bonds with users and provide high-quality recommendations. This paper pushes forward a promising paradigm called target-driven recommendation dialog systems, which is highly desired yet under-explored. We focus on how to naturally lead users to gradually accept a designated target through conversations. To this end, we propose a Target-driven Conversation Planning (TCP) framework to plan a sequence of dialogue actions and topics, driving the system to transition between different conversation stages proactively. We then apply our TCP with the planned content to guide dialogue generation. Experimental results show that our conversation planning significantly improves the performance of target-driven recommendation dialog systems.
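As a schematic of what a TCP-style plan contains, the sketch below represents a plan as an ordered sequence of (dialogue action, topic) pairs ending at the designated target. The data structure and the example actions are assumptions for illustration; the paper learns such plans with a neural model.

```python
# Schematic plan representation; action names and topics are hypothetical.
from dataclasses import dataclass

@dataclass
class PlanStep:
    action: str  # e.g., "chit-chat", "ask preference", "recommend"
    topic: str

plan = [
    PlanStep("greeting", "weather"),
    PlanStep("chit-chat", "movies"),
    PlanStep("ask preference", "science-fiction films"),
    PlanStep("recommend", "target movie"),  # the designated target
]
for step in plan:
    print(f"{step.action} -> {step.topic}")
```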
Recent studies on the robustness of deep learning show that Vision Transformers (ViTs) surpass convolutional neural networks (CNNs) under certain perturbations, e.g., natural corruptions and adversarial attacks. Some papers argue that the superior robustness of ViTs comes from the segmentation of their input images; others say that multi-head self-attention (MSA) is the key to preserving robustness. In this paper, we aim to introduce a principled and unified theoretical framework for investigating such arguments about ViT robustness. We first theoretically prove that, unlike Transformers in natural language processing, ViTs are Lipschitz continuous. We then theoretically analyze the adversarial robustness of ViTs from the perspective of the Cauchy problem, through which we can quantify how robustness propagates through the layers. We demonstrate that the first and last layers are the critical factors affecting ViT robustness. Furthermore, based on our theory, we empirically show that, contrary to the claims of existing research, MSA only contributes to the adversarial robustness of ViTs under weak adversarial attacks, e.g., FGSM, and, surprisingly, MSA actually harms the adversarial robustness of the model under strong attacks, e.g., PGD attacks.
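Lipschitz continuity can be probed empirically by sampling nearby input pairs and measuring the ratio ||f(x) - f(y)|| / ||x - y||; a bounded ratio over many samples is consistent with (though does not prove) the property. The toy model below stands in for a ViT, and the probe is a generic check rather than the paper's theoretical analysis.

```python
# Empirical Lipschitz probe on a toy stand-in model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128),
                      nn.ReLU(), nn.Linear(128, 10))
model.eval()

ratios = []
with torch.no_grad():
    for _ in range(100):
        x = torch.randn(1, 3, 32, 32)
        y = x + 1e-3 * torch.randn_like(x)
        # ||f(x) - f(y)|| / ||x - y|| for a nearby input pair
        ratios.append(((model(x) - model(y)).norm() / (x - y).norm()).item())
print(f"max observed ratio: {max(ratios):.3f}")
```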
Existing studies on multimodal sentiment analysis rely heavily on the text modality and inevitably induce spurious correlations between textual words and sentiment labels. This greatly hinders the generalization ability of the model. To address this problem, we define the task of out-of-distribution (OOD) multimodal sentiment analysis. This task aims to estimate and mitigate the bad effect of the text modality for strong generalization. To this end, we embrace causal inference, which inspects the causal relationships via a causal graph. From the graph, we find that the spurious correlations are attributed to the direct effect of the text modality on the model prediction, while the indirect effect, which considers multimodal semantics, is more reliable. Inspired by this, we design a model-agnostic counterfactual framework for multimodal sentiment analysis, which captures the direct effect of the text modality with an extra text-only model and estimates the indirect effect with a multimodal model. During inference, we first estimate the direct effect via counterfactual inference, and then subtract it from the total effect of all modalities to obtain the indirect effect for reliable prediction. Extensive experiments show the superior effectiveness and generalization ability of our proposed framework.
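The inference rule described above reduces to a subtraction: estimate the direct text effect with a text-only model and remove it from the total effect of the full multimodal model, keeping the more reliable indirect effect. The function below is a schematic sketch; the logit shapes and the plain subtraction are assumptions about one simple instantiation.

```python
# Schematic counterfactual debiasing at inference time;
# shapes and the plain subtraction are assumptions.
import torch

def debiased_prediction(multimodal_logits: torch.Tensor,
                        text_only_logits: torch.Tensor) -> torch.Tensor:
    # total effect minus the direct effect of text = indirect effect
    return multimodal_logits - text_only_logits

tie = debiased_prediction(torch.tensor([2.0, 0.5]), torch.tensor([1.5, -0.2]))
print(tie)  # tensor([0.5000, 0.7000])
```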